A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++

نویسنده

  • Dennis Wei
چکیده

This paper studies the k-means++ algorithm for clustering as well as the class ofD sampling algorithms to which k-means++ belongs. It is shown that for any constant factor β > 1, selecting βk cluster centers by D sampling yields a constant-factor approximation to the optimal clustering with k centers, in expectation and without conditions on the dataset. This result extends the previously known O(log k) guarantee for the case β = 1 to the constant-factor bi-criteria regime. It also improves upon an existing constant-factor bi-criteria result that holds only with constant probability.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Bi-Criteria Approximation Algorithm for k-Means

We consider the classical k-means clustering problem in the setting bi-criteria approximation, in which an algoithm is allowed to output βk > k clusters, and must produce a clustering with cost at most α times the to the cost of the optimal set of k clusters. We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to...

متن کامل

A hybrid DEA-based K-means and invasive weed optimization for facility location problem

In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...

متن کامل

Cut Problems in Graphs with a Budget Constraint

We study budgeted variants of classical cut problems: the Multiway Cut problem, the Multicut problem, and the k-Cut problem, and provide approximation algorithms for these problems. Specifically, for the budgeted multiway cut and the k-cut problems we provide constant factor approximation algorithms. We show that the budgeted multicut problem is at least as hard to approximate as the sparsest c...

متن کامل

Approximation Algorithm for the Max k-CSP Problem

We present a ck 2k approximation algorithm for the Max k-CSP problem (where c > 0.44 is an absolute constant). This result improves the previously best known algorithm by Hast, which has an approximation guarantee of Ω( k 2k log k ). Our approximation guarantee matches the upper bound of Samorodnitsky and Trevisan up to a constant factor (their result assumes the Unique Games Conjecture).

متن کامل

Scalable constant k-means approximation via heuristics on well-clusterable data

We present a simple heuristic clustering procedure, with running time independent of the data size, that combines random sampling with Single-Linkage (Kruskal’s algorithm), and show that with sufficient probability, it has a constant approximation guarantee with respect to the optimal k-means cost, provided an optimal solution satisfies a center-separability assumption. As the separation increa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016